Abstract — Hierarchical cooperation has recently been shown to 
achieve better throughput scaling than classical multihop schemes 
under certain assumptions on the channel model in static wireless 
networks. However, the end-to-end delay of this scheme turns 
out to be significantly larger than that of multihop schemes.
A modification of the scheme is proposed here that achieves
a throughput-delay trade-off $D(n) = (\log n)^2\, T(n)$ for $T(n)$
between $\Theta(\sqrt{n}/\log n)$ and $\Theta(n/\log n)$, where $D(n)$ and $T(n)$
are respectively the average delay per bit and the aggregate
throughput in a network of $n$ nodes. This trade-off complements
the previous results of El Gamal et al., which show that the
throughput-delay trade-off for multihop schemes is given by
$D(n) = T(n)$ where $T(n)$ lies between $\Theta(1)$ and $\Theta(\sqrt{n})$.

Meanwhile, the present paper considers the network multiple- 
access problem, which may be of interest in its own right. 

Index Terms — Ad hoc Wireless Networks, Hierarchical Coop- 
eration, Scaling Laws, Throughput-Delay Trade-off. 

I. Introduction 

Scaling laws offer a way of studying fundamental trade-offs 
in wireless networks as well as of highlighting the qualitative 
and architectural properties of specific designs. Such study has 
been initiated by the work [1] of Gupta and Kumar in 2000. 
Their by now familiar model considers n nodes randomly 
distributed on a unit area, each of which wants to communicate 
to a random destination at a common rate R(n). They ask 
what is the maximally achievable scaling of the aggregate 
throughput T(n) = nR(n) and show that cooperation be- 
tween nodes can dramatically improve performance. Instead 
of using the simple scheme of time-sharing between direct 
transmissions from source nodes to destinations, which only 
achieves aggregate throughput $\Theta(1)$, the nodes can cooperate
and relay the packets by multihopping from one node to the
next, in which case an aggregate throughput scaling of $\Theta(\sqrt{n})$
is achieved. The price to pay, however, is in terms of delay.
In the multi-hop scheme, the packets need to be retransmitted
many times before they reach their actual destinations, which
results in larger end-to-end delay. More precisely, as shown
later in [2], [3], in a multi-hop scheme, bits are delivered to
their destinations in $\Theta(\sqrt{n})$ time-slots on average after they
leave their source nodes, while the average delay for the simple
TDMA scheme remains only $\Theta(1)$. Note that this accounts
only for on-the-flight delay; the queuing delay at the source
node is not considered.

Recently, it has been shown in [4] that under certain assumptions
on the channel model, a much better throughput scaling
is achievable in wireless networks than the one achieved by
classical multi-hop schemes. The authors exhibit a hierarchical
cooperation scheme that uses distributed MIMO communication
to achieve aggregate throughput scaling arbitrarily close
to linear, i.e. $T_h(n) = \Theta(n^{h/(h+1)})$ for any integer $h > 0$.
The parameter $h$ corresponds to the number of hierarchical
levels used in the scheme and by increasing $h$, one can
get arbitrarily close to linear scaling. A natural question is
whether there is a price to pay for this superior scaling of the
throughput. In particular, what is the throughput-delay trade-off
for hierarchical cooperation? In this paper, we analyze the
delay performance of the hierarchical cooperation scheme and
show that the structure suggested in [4] is far from optimal
from the delay point of view. We propose a modification of
the scheme that achieves much better delay
performance for the same throughput. More precisely, we show
that one important drawback of the scheme in [4], which is not
immediately clear from the presentation there, is that it
uses an extremely large bulk-size, where the bulk-size of a
scheme refers to the minimum number of bits that should
be communicated between each source-destination pair under
this scheme. The bulk-size used by the scheme in [4] scales as
$B_h(n) = \Theta(n^{h/2})$; in other words, it grows arbitrarily fast as the
throughput approaches linear scaling. Note that the bulk-size
immediately imposes a lower bound on the end-to-end delay
of each communication; even if there is no transmission delay
from the source node to the destination node, receiving a bulk
of $B(n)$ bits will take at least $\Theta(B(n)/\log n)$ channel uses
for a destination node, since a simple application of the cut-set
bound upper bounds the rate of reception by (or transmission
from) a node by $\log n$ bits per channel use.

The basic idea behind the hierarchical cooperation scheme 
in [4] is to first distribute the bits of a source node among 
its neighboring nodes, so that these bits can then be simul- 
taneously transmitted to a group of nodes in the vicinity of 
the destination node. By collecting the observations of the
receiving nodes at the actual destination node, the destination
node is able to recover the bits intended for it. This
kind of transmission is often referred to as distributed MIMO 
since it resembles the multiple-input-multiple-output transmis- 
sions between a transmitter and receiver pair with multiple 
transmit and receive antennas respectively. The efficiency of 
the distributed MIMO transmission increases with the size 
of (number of nodes contained in) the transmit and receive 
clusters, formed around the source node and the destination
node respectively. However, if the size of the transmit cluster
is large, the bulk of data to be communicated by each source 
node has to be large enough to be chopped off and distributed 
among the many nodes in the cluster. Hence, the size of the 
transmit cluster imposes a lower bound on the bulk size that 
needs to be communicated between each source-destination 
pair. Moreover, distributing the bits of the source node before 
the MIMO transmission and collecting the observations to 
the destination node following the MIMO transmission brings 
another traffic requirement. It has been shown in [4] that this 
cooperation traffic can be handled efficiently if decomposed 
into multiple problems of the original kind, i.e., of com- 
municating between n source-destination pairs in a network 
of n nodes and reusing the idea of distributed MIMO. This 
recursion builds a hierarchical architecture that is shown to 
be efficient from throughput point of view. However, since 
distributed MIMO based communication imposes a lower 
bound on the bulk-size, repeating the idea recursively yields 
a scheme with even larger bulk-size. This is the reason why
the bulk-size of the hierarchical cooperation scheme increases
as $\Theta(n^{h/2})$ with $h$ hierarchical levels.

In this paper, we suggest a modification of the hierarchi- 
cal cooperation scheme in [4] that handles the problem of 
cooperation more efficiently. In order to do this, we study 
the problem of cooperation more carefully by posing it as 
a network multiple access problem, instead of separating it 
into multiple unicast problems as was originally done in 
[4]. In the network multiple access problem, each of the n 
nodes in the network is interested in conveying independent 
information, say L bits, to each of the other nodes in the 
network. We propose a two-phase hierarchical scheme that
solves the multiple access problem in $\Theta(n^{(h+1)/h})$ time-slots for
any $h > 0$. Using this scheme for cooperation, the modified
hierarchical cooperation scheme achieves the same aggregate
throughput $T_h(n) = \Theta(n^{h/(h+1)})$ by using a much smaller
bulk-size $B_h(n) = T_h(n)$. We show that the reduced bulk-size
consequently reduces the delay and the modified hierarchical
cooperation scheme achieves $D_h(n) = \Theta(n)$.

We proceed by optimizing scheduling in this scheme to 
further reduce the end-to-end delay. To do this, we need to 
consider a generalized version of the multiple access problem 
where each node in the network is interested in conveying 
independent information to each of the nodes in a subset 
of $A(n)$ nodes, where the $A(n) < n$ nodes are chosen
uniformly at random among the n nodes in the network. We 
show that this task can be accomplished in 0(^^-n" s + T logn) 
channel uses for any ft > Q if A(n) > n'w. This allows us 
to achieve a throughput-delay trade-off of $(T(n), D(n)) =
(n^b/\log n,\; n^b \log n)$ for any $0 \le b < 1$. This new result is
depicted in Figure 1 together with previous results from the
literature.

A related line of research (see e.g. [2], [5], [6], [7])
is the characterization of the throughput-delay trade-off for
mobile networks, where nodes move over the duration of
communication according to a certain mobility pattern. In
general, mobility schemes achieve an aggregate throughput
scaling comparable to that of hierarchical cooperation (i.e. up
to linear in $n$), but the delay scaling performance of such
schemes may vary significantly, depending on the chosen
mobility model. For instance, under the classical random walk
mobility model considered in [2], the performance is quite
poor, as illustrated in Figure 1. But from the delay point of
view, a more prominent disadvantage which is common to all
mobility models and which does not appear on the graph in
Figure 1 is the constant that precedes the delay scaling law.
Roughly speaking, this pre-constant relates to the speed of
nodes in the case of mobility schemes, whereas it relates to
the speed of light in the case of hierarchical cooperation.

Fig. 1. Throughput-delay performance achieved by hierarchical cooperation
together with known results from the literature.

II. Setting and Main Results 

There are n nodes uniformly and independently distributed 
in a square of unit area. Every node is both a source and a 
destination. The sources and destinations are paired up one- 
to-one in a random fashion without any consideration on 
respective locations. Each source has the same traffic rate R(n) 
to send to its destination node. (In the following text, we will 
sometimes refer to this traffic pattern as the unicast problem in 
order to distinguish it from the multicast problems that will be 
discussed in Sections IV and V-B.) The aggregate throughput
of the system is T(n) = nR(n). 

The complex baseband-equivalent channel gain between
node $i$ and node $k$ at time $m$ is given by:

$$H_{ik}[m] = r_{ik}^{-\alpha/2} \exp(j\theta_{ik}[m]) \qquad (1)$$

where $r_{ik}$ is the distance between the nodes, $\theta_{ik}[m]$ is the
random phase at time $m$, uniformly distributed in $[0, 2\pi]$, and
$\{\theta_{ik}[m],\ 1 \le i \le n,\ 1 \le k \le n\}$ is a collection of i.i.d. random
processes. The $\theta_{ik}[m]$'s and the $r_{ik}$'s are also assumed to be
independent. The constant $\alpha > 2$ is called the power path loss
exponent of the environment.

Note that the channel is random, depending on the location 
of the users and the phases. The locations are assumed to be 
fixed over the duration of the communication. The phases are 
assumed to vary in a stationary ergodic manner (fast fading). 
We assume that the channel gains are known at all the nodes. 



The signal received by node $i$ at time $m$ is given by

$$Y_i[m] = \sum_{k=1}^{n} H_{ik}[m]\, X_k[m] + Z_i[m]$$

where $X_k[m]$ is the signal sent by node $k$ at time $m$ and $Z_i[m]$
is white circularly symmetric Gaussian noise of variance
$N_0$ per symbol. Every node is subject to a transmit power
constraint that we denote by $P$ (see footnote 1).
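
The channel and received-signal model can be illustrated with a few lines of code. The following is a minimal sketch using only the Python standard library; the function names, the node positions and the choice to skip the term $k = i$ (so that no zero distance appears) are assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math
import random


def channel_gain(r_ik: float, alpha: float, theta: float) -> complex:
    """H_ik[m] = r_ik^(-alpha/2) * exp(j * theta_ik[m])."""
    return r_ik ** (-alpha / 2) * cmath.exp(1j * theta)


def received_signal(i, positions, x, alpha, n0, rng):
    """Y_i[m] = sum_k H_ik[m] X_k[m] + Z_i[m] for a single time index m."""
    y = 0j
    for k, (pos_k, x_k) in enumerate(zip(positions, x)):
        if k == i:
            continue  # assumption: a node does not hear its own transmission
        r_ik = math.dist(positions[i], pos_k)
        theta = rng.uniform(0.0, 2.0 * math.pi)  # fresh i.i.d. phase (fast fading)
        y += channel_gain(r_ik, alpha, theta) * x_k
    # Circularly symmetric noise: variance n0 split over real and imaginary parts.
    sigma = math.sqrt(n0 / 2.0)
    return y + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


if __name__ == "__main__":
    rng = random.Random(1)
    n = 5  # hypothetical small network
    positions = [(rng.random(), rng.random()) for _ in range(n)]  # unit square
    x = [1.0 + 0j] * n  # every node sends a unit-power symbol
    print(received_signal(0, positions, x, alpha=3.0, n0=1.0, rng=rng))
```

The magnitude of each gain depends only on the distance, while the phase is redrawn at every call, which mirrors the fast-fading assumption stated after equation (1).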

The delay D(n) of a communication scheme for this net- 
work is defined as the average time it takes for a bit, or packet 
of constant size, to reach its destination node after it leaves 
its source node, where the average is taken over all bits or 
packets traveling in the network. So defined, the delay of a 
scheme quantifies the average time spent by the bits traveling 
inside the network while operated under this scheme. 

This definition of delay is consistent with [2], [3] and
therefore the comparison in Figure 1 of the multihop scheme
and hierarchical cooperation is fair. However, note that this
definition does not include the queuing delay at the source
node, as the clock starts when a packet leaves its source
node. The delay at the source node can be accounted for by
assuming a particular packet arrival process and studying the
overall delay of a packet from its arrival at the source queue to
the decoding at the destination node. The transmission delays
given in Figure 1 can be regarded as lower bounds to this
overall delay. Consider for example the simple TDMA scheme,
i.e. one-at-a-time transmission between the source-destination
pairs, which corresponds to the origin in Figure 1. Assume
independent Poisson packet arrivals at each source node at an
appropriate rate. If we assume round-robin, backlog-unaware
scheduling between the transmissions, the overall
delay of the TDMA scheme will be $\Theta(n)$, much larger than
the $\Theta(1)$ delay predicted by Figure 1. However, it is known
that this delay can be reduced to $O(\log n)$ with backlog-aware
scheduling [8]. In general, how much larger the overall delay is than
the transmission delay given in Figure 1 depends on how
well we can match the packet arrival process with backlog-aware
scheduling schemes. In this paper, our aim is to quantify
the transmission delay of the discussed schemes; the second
question regarding the queuing delay at the source is left open.

The following theorem is the main result of this paper.
Theorem 2.1: Using hierarchical cooperation, the following
points are achievable on the throughput-delay scaling curve,

$$(T(n), D(n)) = \Theta\left(n^b/\log n,\; n^b \log n\right)$$

where $0 \le b < 1$ (see Figure 1).
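
As a quick consistency check (a worked step added for illustration, not part of the theorem statement), eliminating $b$ from the two coordinates of Theorem 2.1 recovers the trade-off quoted in the abstract:

```latex
\[
\frac{D(n)}{T(n)} = \frac{n^{b}\log n}{n^{b}/\log n} = (\log n)^{2}
\quad\Longrightarrow\quad
D(n) = (\log n)^{2}\, T(n).
\]
```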

III. Overview of the Hierarchical Cooperation 
Scheme 

In this section, we give a brief overview of the hierarchical
cooperation scheme as presented in [4] and establish
the throughput-delay trade-off for this scheme. Some of the
discussions presented here directly build on results already
established in [4].

1 We present the low-level assumptions on the channel and network model
in Section II for the sake of completeness. However, most of the discussions
in the following sections will rely on intermediate results established in [4],
hence the dependence of the results on the low-level assumptions might not
always be clear.

The hierarchical cooperation scheme is based on clustering
the nodes in the network and performing long-range MIMO
transmissions between the clusters. The long-range MIMO
transmissions should be preceded and followed by cooperation
phases establishing transmit and receive cooperation
respectively, which yields three successive phases in the operation
of the network. If simple TDMA is used for establishing
cooperation in phases 1 and 3, the overall scheme achieves a
$\sqrt{n}$-scaling of the aggregate throughput. This is the three phase
scheme discussed in Section III-A. Higher throughputs can be
achieved by setting up the cooperation problem as multiple communication
problems and using the three phase scheme as a
solution to each of those communication problems. This yields
the idea of recursion and results in a hierarchical architecture,
where increasing the number of levels in the hierarchy yields
an aggregate throughput scaling arbitrarily close to linear. The
hierarchical cooperation scheme is discussed in more detail in
Section III-B.



A. The Three Phase Scheme 

The network is divided into clusters of $M_1$ nodes and a
particular source node $s$ sends $M_1$ bits to its destination node
$d$ in three steps:

(1) Node $s$ first distributes its $M_1$ bits among the $M_1$ nodes
in its cluster, one bit for each node;

(2) These nodes together can then form a distributed transmit
antenna array, sending the $M_1$ bits simultaneously to the
destination cluster where $d$ lies;

(3) Each node in the destination cluster gets one observation
from the MIMO transmission, and it quantizes and ships
the observation to $d$, which can then do joint MIMO
processing of all the observations and decode the $M_1$
transmitted bits.

From the network point of view, all source-destination pairs 
have to eventually accomplish these three steps. Step 2 is long- 
range communication and only one source-destination pair can 
operate at a time. Steps 1 and 3 involve local communica- 
tion and can be parallelized across source-destination pairs. 
Combining all this leads to the following three phases in the 
operation of the network: 

Phase 1: Setting Up Transmit Cooperation. Clusters work
in parallel. Within a cluster, each source node distributes $M_1$
bits to the other nodes, 1 bit for each node, such that at the
end of the phase, each node has 1 bit from each of the other
nodes in its cluster. (Recall our assumption that each node is a
source for some communication request and a destination for
another.) Thus, since there are $M_1$ source nodes in each cluster,
this gives a traffic demand of exchanging $M_1(M_1 - 1) \approx M_1^2$
bits. Using TDMA, i.e. one-at-a-time transmission between pairs
of nodes, these $M_1^2$ bits can be exchanged in $M_1^2$ time slots (see footnote 2).

Phase 2: MIMO Transmissions. Successive long-distance
MIMO transmissions are performed between source-destination
pairs, one at a time. In each one of the MIMO
transmissions, say the one between $s$ and $d$, the $M_1$ bits of $s$
are simultaneously transmitted by the $M_1$ nodes in its cluster
to the $M_1$ nodes in the cluster of $d$. The long-distance
MIMO transmissions are repeated for each source-destination
pair in the network, hence we need $n$ time-slots to complete
the phase.

Phase 3: Cooperate to Decode. Clusters work in parallel.
Since there are $M_1$ destination nodes inside each cluster, each
cluster received $M_1$ MIMO transmissions in phase 2, one
intended for each of the destination nodes in the cluster. Thus,
each node in the cluster has $M_1$ received observations, one
from each of the MIMO transmissions, and each observation
is to be conveyed to a different node in its cluster. Nodes
quantize each observation into a fixed number $Q$ of bits, so there are a
total of $Q M_1^2$ bits to be exchanged inside each cluster. Using
TDMA as in Phase 1, the phase can be completed in $Q M_1^2$
time slots.

In [4], it is shown that each destination node is able to
decode the transmitted bits from its source node from the
$M_1$ quantized signals it gathers by the end of Phase 3.
The throughput achieved by the scheme can be calculated as
follows: each source node is able to transmit $M_1$ bits to its
destination node, hence $nM_1$ bits in total are delivered to their
destinations in $M_1^2 + n + Q M_1^2$ time slots, yielding an aggregate
throughput of

$$\frac{n M_1}{M_1^2 + n + Q M_1^2}$$

bits per time-slot. Choosing $M_1 = \sqrt{n}$, which maximizes the scaling
order of this expression, yields an aggregate throughput $T(n) = \frac{1}{2+Q}\sqrt{n}$.
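
The time and throughput accounting of the three phase scheme is easy to check numerically. The following is a minimal sketch, assuming unit-rate transmissions (1 bit per time slot) as in the text; the function names and the values chosen for `n` and `q` are illustrative and not taken from the paper.

```python
import math


def three_phase_time(n: int, m1: int, q: int) -> int:
    """Total time slots: phase 1 (M1^2) + phase 2 (n) + phase 3 (Q * M1^2)."""
    return m1 * m1 + n + q * m1 * m1


def three_phase_throughput(n: int, m1: int, q: int) -> float:
    """Aggregate throughput n * M1 / (M1^2 + n + Q * M1^2) in bits per slot."""
    return n * m1 / three_phase_time(n, m1, q)


if __name__ == "__main__":
    n, q = 10_000, 4        # hypothetical network size and quantization bits
    m1 = math.isqrt(n)      # cluster size M1 = sqrt(n)
    print("time slots :", three_phase_time(n, m1, q))
    print("throughput :", three_phase_throughput(n, m1, q))
```

With $M_1 = \sqrt{n}$ the total time evaluates to $(2+Q)n$ slots and the throughput to $\sqrt{n}/(2+Q)$ bits per slot, in agreement with the expressions above.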

Note that as opposed to multihop, this three phase scheme
allows only bulk transmission between any source-destination
pair in the network; i.e. one cannot arbitrarily communicate
one bit (or $L$ bits with $L$ constant) using the three-phase
scheme, but $M_1 = \sqrt{n}$ bits should be transferred between
every source-destination pair with each use of the scheme.

2 Note that although, because of the broadcasting nature of TDMA, every
bit of a source node can be conveyed to all other nodes in the cluster for
free, this is not what we require here. In the MIMO transmissions that
follow in the next phase, every node independently encodes its data and
does not need to know the bits transmitted by the other nodes. A standard
reference on the capacity of MIMO channels is [10]. The derivations for the
current case can be found in [4].

3 In order to convey the salient features of the hierarchical
cooperation scheme to the reader in the simplest way, a rather informal
approach is taken in this section and some technical details are omitted. A
rigorous description of the scheme can be found in [4]. For example, it is not
necessary to have exactly $M_1$ nodes in each cluster; it suffices to have
$\Theta(M_1)$ nodes for a scaling law analysis. It is shown in [4] that by dividing
the network into cells of a certain area, we can ensure having $\Theta(M_1)$ nodes
in all cells with high probability. Moreover, the case when a source node and
its destination lie in the same cluster should be treated separately. Similarly,
assuming that each source node is sending exactly 1 bit to each of the other
nodes in its cluster in phase 1 is a simplification. A rigorous argument will
assume that each source node is sending $L$ bits to each of the other nodes
where $L$ is a large enough constant independent of $M_1$ and $n$. The rates of
the TDMA transmissions in phase 1 and phase 3 and the per node rate for the
MIMO transmission in phase 2 are assumed to be 1 for simplicity, so that 1
bit is transmitted in 1 time slot. The actual rates of these transmissions can be
shown to be constants depending on the system parameters and independent
of $M_1$ and $n$. Also, in phase 1 and phase 3 not all clusters should be allowed
to operate simultaneously but a TDMA scheme between the clusters should
be employed so that the resultant inter-cluster interference is bounded and
each cluster becomes active a constant fraction of time.

The end-to-end delay of this scheme is simply the total time
for the three phases, since the bits are leaving their source
nodes at the beginning of the first phase and are only decoded
by their respective destination nodes at the end of the third
phase. With the choice $M_1 = \sqrt{n}$, we see that the delay of
the three phase scheme is $D(n) = (2 + Q)n$. Note that this
delay scaling is much worse than the $\Theta(\sqrt{n})$ delay of
the multi-hop scheme achieving the same aggregate throughput.

B. The Hierarchical Cooperation Scheme 

Higher aggregate throughput scaling can be achieved by
using better network communication schemes than TDMA
to establish the transmit and receive cooperation in the first
and third phases of the three phase scheme described in the
previous section. Recall that there are $M_1^2$ and $Q M_1^2$ bits to be
exchanged inside each cluster in phases 1 and 3, respectively.
This traffic demand of exchanging $M_1^2$ bits (or $Q M_1^2$ bits)
can be handled by setting up $M_1$ sub-phases, and assigning
$M_1$ pairs in each sub-phase to communicate their 1 bit (or $Q$
bits). The traffic to be handled at each sub-phase now looks
similar to the original network communication problem (the
unicast network problem defined in Section II), with $M_1$ users
instead of $n$. Any scheme suggesting a good solution for the
original problem can now be used inside the sub-phases as
an alternative to TDMA; for example, the multi-hop scheme
and the three-phase scheme itself would be two alternatives
both achieving an aggregate throughput scaling $\Theta(\sqrt{M_1})$ (in
a network of size $M_1$) as opposed to the $\Theta(1)$ scaling achieved
by TDMA.

Consider using the three phase scheme for cooperation as
suggested in [4]. More precisely, we want to handle the traffic
of communicating 1 bit (or $Q$ bits) between the $M_1$ pairs
assigned in each sub-phase of phase 1 (or phase 3), by further
dividing the clusters into smaller clusters of size $M_2$ and
reusing the three phase scheme (TDMA-MIMO-TDMA). Note
that this will create a hierarchical structure with two levels;
see Figure 2. Note however that the three phase scheme in
Section III-A allows only bulk transmissions between source-destination
pairs. In this particular case, one will have to
communicate $M_2$ bits between the source-destination pairs
assigned at each sub-phase, as opposed to the original requirement
of communicating only 1 bit (or $Q$ bits). For the overall
scheme, this in turn increases the bulk-size to be communicated
between every source-destination pair in the network from $M_1$
bits to $M_1 \times M_2$ bits. So for the 2-level hierarchical scheme, we
have to start by assuming that each source node in the network
has $M_1 \times M_2$ bits to communicate to its destination node. It
can be seen that these $M_1 \times M_2$ bits per source-destination
pair, or a total of $n \times M_1 \times M_2$ bits in the network, can be
communicated in

$$M_1\left(M_2^2 + M_1 + Q M_2^2\right) + M_2\, n + M_1 Q\left(M_2^2 + M_1 + Q M_2^2\right) \qquad (2)$$

time slots using the 2-level hierarchical scheme. The first term
$M_1(M_2^2 + M_1 + Q M_2^2)$ is the completion time of phase 1
of the three phase scheme. It is divided into $M_1$ sessions;
in each session, $M_1$ source-destination pairs are assigned to
communicate their $M_2$ bits using a three phase scheme of
TDMA-MIMO-TDMA. Recall from the computations of the
three phase scheme in Section III-A that this takes
$M_2^2 + M_1 + Q M_2^2$ time slots ($M_1$ and $M_2$ here correspond to the $n$ and
$M_1$, respectively, of the previous section). A similar argument
holds for the third term $M_1 Q (M_2^2 + M_1 + Q M_2^2)$ in (2), which
is the completion time for phase 3 with the extra $Q$ factor.
Note that at the end of the first phase, each source node has
distributed its $M_1 \times M_2$ bits among the $M_1$ nodes in its cluster,
hence $M_2$ bits for each node. These bits can be relayed to
the destination cluster in $M_2$ successive MIMO transmissions.
Since the MIMO transmissions have to be repeated for each of
the $n$ source-destination pairs in the network, the completion
time of the second phase is $M_2 n$ in (2).

Therefore, the aggregate throughput of the 2-level scheme
is given by the expression

$$\frac{n M_1 M_2}{M_1\left(M_2^2 + M_1 + Q M_2^2\right) + M_2\, n + M_1 Q\left(M_2^2 + M_1 + Q M_2^2\right)} \qquad (3)$$

and the optimal choices of $M_1 = n^{2/3}$ and $M_2 = n^{1/3}$
maximize the aggregate throughput scaling to

$$T_2(n) = M_1 = n^{2/3},$$

while the denominator dictating the delay of the scheme is of
order

$$D_2(n) = M_2 \times n = n^{4/3}.$$

Note that with the 2-level hierarchical scheme, we improve
the aggregate throughput scaling from $\sqrt{n}$ for the three-phase
scheme in the previous section to $n^{2/3}$, at the cost of increasing
the bulk-size from $\sqrt{n}$ to $n$, which, in turn, increases the delay
from $n$ to $n^{4/3}$.
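
The completion time (2) and the throughput (3) of the 2-level scheme can be evaluated directly. The following is a minimal sketch under the same unit-rate simplification as the text; the function names, the integer parameter `k` (chosen so that $n = k^3$, $M_1 = k^2$ and $M_2 = k$ are exact integers) and the value of `q` are assumptions made for illustration.

```python
def two_level_time(n: int, m1: int, m2: int, q: int) -> int:
    """Completion time (2): phase 1 + phase 2 + phase 3 of the 2-level scheme."""
    # Inner three phase scheme run inside a cluster of m1 nodes with sub-clusters of m2.
    inner = m2 * m2 + m1 + q * m2 * m2
    phase1 = m1 * inner        # m1 sessions of the inner scheme
    phase2 = m2 * n            # m2 MIMO transmissions for each of the n pairs
    phase3 = m1 * q * inner    # as phase 1, with the extra factor Q
    return phase1 + phase2 + phase3


def two_level_throughput(n: int, m1: int, m2: int, q: int) -> float:
    """Aggregate throughput (3): n * M1 * M2 bits over the completion time."""
    return n * m1 * m2 / two_level_time(n, m1, m2, q)


if __name__ == "__main__":
    k, q = 100, 4                      # hypothetical values
    n, m1, m2 = k**3, k**2, k          # M1 = n^(2/3), M2 = n^(1/3)
    print("time slots :", two_level_time(n, m1, m2, q))
    print("throughput :", two_level_throughput(n, m1, m2, q))
```

For these choices every term of (2) is of order $n^{4/3}$: the total is $[(1+Q)(2+Q)+1]\, n^{4/3}$ slots, and the throughput is $n^{2/3}$ divided by the same constant.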

The argument can be applied recursively to build an $h$-level
hierarchical scheme. The optimal cluster size at the $k$-th
level of an $h$-level hierarchical scheme can be computed as
$M_k = n^{\frac{h-k+1}{h+1}}$. The aggregate throughput achieved by an $h$-level
hierarchical cooperation scheme is given by

$$T_h(n) = M_1 = n^{\frac{h}{h+1}}.$$

The bulk-size is

$$B_h(n) = M_h \times \cdots \times M_1 = n^{\frac{h}{2}}$$

and the end-to-end delay is

$$D_h(n) = M_h \times M_{h-1} \times \cdots \times M_2 \times n = n^{\frac{h}{2} + \frac{1}{h+1}}$$

where we observe that for large $h$, the delay exponent is linear
in $h$.
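
The scaling exponents of the $h$-level scheme can be tabulated with exact rational arithmetic. The sketch below assumes cluster sizes of the form $M_k = n^{(h-k+1)/(h+1)}$, which is the form consistent with the cases $h = 1$ ($M_1 = \sqrt{n}$) and $h = 2$ ($M_1 = n^{2/3}$, $M_2 = n^{1/3}$) worked out above; the function names are illustrative.

```python
from fractions import Fraction


def cluster_exponent(h: int, k: int) -> Fraction:
    """Exponent of n in M_k, assumed to be (h - k + 1) / (h + 1) for 1 <= k <= h."""
    return Fraction(h - k + 1, h + 1)


def scaling_exponents(h: int) -> dict:
    """Exponents of n for throughput, bulk-size and delay of the h-level scheme."""
    ms = [cluster_exponent(h, k) for k in range(1, h + 1)]
    return {
        "throughput": ms[0],                       # T_h = M_1
        "bulk": sum(ms, Fraction(0)),              # B_h = M_h x ... x M_1
        "delay": sum(ms[1:], Fraction(0)) + 1,     # D_h = M_h x ... x M_2 x n
    }


if __name__ == "__main__":
    for h in range(1, 6):
        print(h, scaling_exponents(h))
```

Products of powers of $n$ become sums of exponents, so the bulk-size exponent is $h/2$ and the delay exponent is $h/2 + 1/(h+1)$; for $h = 2$ this reproduces the values $2/3$, $1$ and $4/3$ found for the 2-level scheme.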

The results obtained in this section establish the poor delay 
performance of hierarchical cooperation. Note that the delay 
is mostly due to the large bulk-size used by the scheme. This 
is different from multi-hop schemes since their bulk-size is
constant ($\Theta(1)$) independent of the throughput achieved. The
delay in multihop is rather due to the time spent in relaying the 
messages inside the network. In the next section, we suggest 
a modification of the scheme so that it achieves the same 
throughput using much smaller bulk-size. 



IV. Hierarchical Cooperation with Smaller 
Bulk-Size 

In this section, we treat the problem of cooperation in the 
three phase scheme with more care. We start by defining the 
network multiple access problem to be the following. 

Definition 4.1 (The Network Multiple Access Problem):
Consider the assumptions on the network and channel model
given in Section II. Let each node in the network be interested
in communicating independent information to each of the
other nodes in the network. In particular, let us assume
that each node has an independent 1 bit message (or $L$
independent bits, with $L$ constant) to send to each of the
other nodes in the network and the quantity of interest is the
smallest time $F(n)$ required to accomplish this task. We
refer to this problem as the network multiple access problem.

The following theorem provides an achievable solution to 
this problem. 

Theorem 4.1: For any integer $h > 0$, the network MAC
problem can be solved in

$$F(n) \le K n^{\frac{h+1}{h}}$$

time-slots w.h.p. (see footnote 4), for some constant $K > 0$ independent of
$n$.

Proof of Theorem 4.1: Let us start by assuming that there
exists a scheme that solves the multiple access problem in
$F(n) = n^b$ time-slots with $b > 1$. Note that one such scheme
is simple TDMA, which yields $b = 2$. Using this existing scheme,
we will construct a new scheme that yields smaller $F(n)$.

As before, let us start by dividing the network into clusters 
of M nodes. Let us first focus on one specific cluster S and one 
node d located outside of this cluster. In particular, all nodes 
in S have 1 bit to send to d. These bits can be communicated 
to d in two steps: 

(1) The nodes in $S$ simultaneously transmit their 1 bit
messages destined to d forming a distributed transmit 
antenna array for MIMO transmission. The nodes in the 
destination cluster where d lies, form a distributed receive 
antenna array for this MIMO transmission. 

(2) Each node in the destination cluster obtains one observa- 
tion from the MIMO transmission in the previous phase; 
it quantizes and ships this observation to d, which can 
do joint MIMO processing of all the observations and 
decode the M transmitted bits from the nodes in S. 

As a first step towards handling the whole network problem, 
note that these two steps should be accomplished between S 
and all other nodes in the network. This can again be done in 
two steps: 

Phase 1: MIMO transmissions. We perform successive
long-distance MIMO transmissions between $S$ and all other
nodes in the network. In each of the MIMO transmissions,
say between $S$ and $d$, the $M$ nodes in $S$ are simultaneously
transmitting the 1 bit messages they would like to communicate
to $d$ and the $M$ nodes in the cluster where $d$ lies are
observing the MIMO transmission. The MIMO transmissions
should be repeated for each node in the network, hence we
need $n$ time-slots to complete the phase.

4 with high probability

[Figure 2: time-division diagram with blocks labeled PHASE 1, PHASE 2, PHASE 3.]

Fig. 2. The salient features of the three phases and the time division in a hierarchical scheme are illustrated. Figure taken from [4].

Phase 2: Cooperate to decode. Clusters work in parallel.
Since there are $M$ nodes inside each cluster, each cluster
received $M$ MIMO transmissions from $S$ in the previous
phase, one intended for each node in the cluster. Thus, each
node in the cluster has $M$ observations, one from each of the
MIMO transmissions, and each observation is intended for a
different node in the cluster. Each of these observations can
be quantized into $Q$ bits, with a fixed $Q$, which yields exactly
the original network multiple access problem, with $M$ nodes
instead of $n$. Using the scheme we assumed to exist in the
beginning of the proof, this task can be completed in $Q M^b$
time slots.

The total time we have spent during the two phases for
handling the traffic originating from cluster $S$ is given by $n + Q M^b$.
From the network point of view, the above two phases
should be completed for all $n/M$ clusters in the network. Thus,
the multicasting task can be completed in $\frac{n}{M}(n + Q M^b)$ time
slots in total. Choosing $M = n^{1/b}$ in order to minimize the order of this
quantity yields $F(n) = (1 + Q)\, n^{2 - \frac{1}{b}}$.
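
The choice of the cluster size $M$ can be motivated by a short order-of-magnitude calculation (a sketch added for clarity; constants are ignored when balancing the two terms):

```latex
\[
\frac{n}{M}\left(n + Q M^{b}\right) = \frac{n^{2}}{M} + Q\, n\, M^{b-1}.
\]
% For b > 1 the first term decreases in M and the second increases in M.
% They are of the same order when M^b = n, i.e. M = n^{1/b}, which gives
\[
\frac{n^{2}}{n^{1/b}} + Q\, n \cdot n^{\frac{b-1}{b}}
= n^{2-\frac{1}{b}} + Q\, n^{2-\frac{1}{b}}
= (1+Q)\, n^{2-\frac{1}{b}}.
\]
```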

Note that $2 - \frac{1}{b} < b$ for $b > 1$, since $b - (2 - \frac{1}{b}) = (b-1)^2/b > 0$.
In other words, we have
established a solution for the multiple access problem that is
better than the one we started with. Indeed, the two phase
scheme described above can be used recursively, yielding a
better scheme at each step of the recursion. In particular,
starting with TDMA achieving $b = 2$ and applying the idea
recursively $h$ times, one gets a scheme that solves the multiple
access problem in $\Theta(n^{\frac{h+1}{h}})$ time slots. The operation of this
scheme is illustrated in Figure 3. □
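
The effect of the recursion on the exponent $b$ can be traced with exact fractions. The following is a minimal sketch; the function names are illustrative, and indexing TDMA as the first level of the hierarchy is a convention assumed here.

```python
from fractions import Fraction


def improve(b: Fraction) -> Fraction:
    """One use of the two phase construction: a scheme n^b yields n^(2 - 1/b)."""
    return 2 - 1 / b


def exponents(levels: int) -> list:
    """Exponents obtained by starting from TDMA (b = 2) and improving repeatedly."""
    bs = [Fraction(2)]
    for _ in range(levels - 1):
        bs.append(improve(bs[-1]))
    return bs


if __name__ == "__main__":
    for level, b in enumerate(exponents(6), start=1):
        print(level, b)
```

Iterating $b \mapsto 2 - 1/b$ from $b = 2$ produces the sequence $2, 3/2, 4/3, 5/4, \ldots$, that is, exponents of the form $(h+1)/h$ that decrease towards 1.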

The interest in the multiple access problem arises from the
fact that it exactly models the required traffic for cooperation in
the three phase scheme. Recall the communication requirement
inside the clusters in Phases 1 and 3 described in Section III-A.
This communication requirement, equivalent to a network
multiple access problem, is handled using TDMA in the
three phase scheme, which has been seen to be suboptimal
from the throughput point of view in Section III-A. In the
hierarchical cooperation scheme described in Section III-B,
this multiple access problem is handled by decomposing it into
a number of unicast network problems. The resultant scheme
is optimal in terms of throughput, but not very satisfying
in terms of bulk-size. By using the solution to the multiple
access problem suggested in this section, one can modify the
hierarchical cooperation scheme, so as to achieve the same
throughput with smaller bulk-size and consequently smaller
delay. The resultant modified hierarchical scheme is illustrated
in Figure 4. Note that the gain comes from treating the
cooperation problem as it is and not necessarily as multiple
unicast problems as was previously done in Section III-B.

Fig. 3. The figure illustrates the time-division in the hierarchical scheme
that solves the network multiple access problem.

Corollary 4.1: A modified hierarchical cooperation scheme
can achieve an aggregate throughput $T_h(n) \ge K_1 n^{h/(h+1)}$ with
bulk-size $B_h(n) = K_2 n^{h/(h+1)}$ and delay $D_h(n) \le K_3 n$ w.h.p.,
for any integer $h > 0$ and some positive constants $K_1$, $K_2$, $K_3$
independent of n.



Fig. 4. The figure illustrates the time-division in the modified hierarchical scheme that uses the scheme in Figure 5 for cooperation. Note the difference in
operation of the phases between the modified hierarchical cooperation scheme and the original hierarchical cooperation scheme of [4] in Figure 3.

Proof of Corollary 4.1: Consider the three phase hierarchical
scheme described in Section III-A. By Theorem 4.1, the
required traffic for transmit and receive cooperation in phase
1 and phase 3 can be handled in $KM^{(h+1)/h}$ and $KQM^{(h+1)/h}$
time slots respectively. The expression for the aggregate
throughput then becomes

$$\frac{Mn}{KM^{(h+1)/h} + n + KQM^{(h+1)/h}}$$

which is maximized by the choice $M = n^{h/(h+1)}$, yielding
aggregate throughput $T_h(n) = \frac{1}{1+K+KQ}\,n^{h/(h+1)}$,
bulk-size $B_h(n) = n^{h/(h+1)}$ and delay $D_h(n) = (1 + K + KQ)\,n$. □
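As a numerical sanity check of the expressions in the proof, the following sketch evaluates the throughput and the total time at $M = n^{h/(h+1)}$. The function names and the values K = Q = 1 are assumptions made purely for illustration; the constants in the corollary are not specified numerically.

```python
def three_phase_time(n: float, M: float, h: int, K: float = 1.0, Q: float = 1.0) -> float:
    """Time for phases 1, 2 and 3: K M^((h+1)/h) + n + K Q M^((h+1)/h)."""
    coop = M ** ((h + 1) / h)
    return K * coop + n + K * Q * coop


def three_phase_throughput(n: float, M: float, h: int, K: float = 1.0, Q: float = 1.0) -> float:
    """Aggregate throughput: M n bits delivered over the three phases."""
    return M * n / three_phase_time(n, M, h, K, Q)


n, h, K, Q = 1e6, 3, 1.0, 1.0
M_star = n ** (h / (h + 1))              # cluster size chosen in the proof
T_star = three_phase_throughput(n, M_star, h, K, Q)
D_star = three_phase_time(n, M_star, h, K, Q)

# Compare with the closed forms n^(h/(h+1)) / (1+K+KQ) and (1+K+KQ) n.
print(T_star, n ** (h / (h + 1)) / (1 + K + K * Q))
print(D_star, (1 + K + K * Q) * n)
```

With this cluster size all three terms of the denominator are of order n, which is why the delay stays linear in n while the throughput scales as $n^{h/(h+1)}$.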

V. Hierarchical Cooperation with Better Scheduling

In the previous section, we presented a modified hierarchical
scheme that achieves throughput $T_h(n) = \Theta(n^{h/(h+1)})$ using
bulk-size $B_h(n) = \Theta(n^{h/(h+1)})$. However, the delay of this
scheme is still $D_h(n) = \Theta(n)$. In this section, we optimize
the scheduling in the scheme to further improve the delay
performance to $D_h(n) = \Theta(n^{h/(h+1)} \log n)$. We start by
improving the scheduling in the three phase scheme with h =
1 discussed in Section III-A. We then consider the modified
hierarchical scheme with $h \ge 2$ discussed in Section IV.

Before starting, we state the following binning lemma, which
is similar in spirit to Lemma 4.1 and Lemma 5.1 in [4] and
can be proven using similar techniques. The lemma will be
used repeatedly throughout the rest of the paper.

Lemma 5.1: Let us assume that f(n) balls are thrown into
n bins, independently and uniformly at random. The following
properties are satisfied with failure probability exponentially
small in n.

(a) If $\lim_{n\to\infty} f(n)/n = \infty$, then there are
$\Theta(f(n)/n)$ balls in each bin.

(b) If $\lim_{n\to\infty} f(n)/n = c$ with $c > 0$ a constant
independent of n, then there are at most $O(\log n)$ balls in each bin.
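The two regimes of the lemma can be observed in a small simulation. This is only an illustrative sketch at a single finite n, not a substitute for the proof; the helper name `bin_loads`, the seed and the sizes are arbitrary choices.

```python
import math
import random
from collections import Counter


def bin_loads(num_balls: int, num_bins: int, seed: int = 0) -> list[int]:
    """Throw balls uniformly at random and return the load of every bin."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_bins) for _ in range(num_balls))
    return [counts[b] for b in range(num_bins)]


n = 1000
dense = bin_loads(100 * n, n)   # f(n)/n = 100: loads concentrate around 100
sparse = bin_loads(n, n)        # f(n)/n = 1: the largest load stays small

print("dense  min/max:", min(dense), max(dense))
print("sparse max    :", max(sparse), "vs log n =", round(math.log(n), 2))
```

In the dense case every bin holds a number of balls comparable to f(n)/n, while in the sparse case the largest load can be compared against log n.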

A. Better Scheduling for the Three Phase Scheme 

Recall the operation of the three phase scheme from the 
point of view of a single source-destination pair s-d as 
described in Section III-A: a step (1) where s distributes its
M bits among the M nodes in its cluster, followed by a 
step (2) where these M bits are simultaneously transmitted 
to the destination cluster via MIMO transmission, and a step 
(3) where the quantized MIMO observations are collected at 
the destination node d. These three steps need to be eventually 
accomplished for each source-destination pair in the network. 
In this section, we improve the scheduling in accomplishing 
this task: we organize M successive sessions and allow only 
n/M source-destination pairs to complete the three steps in 
each session. 

In the beginning of each session we randomly choose one 
source node from each cluster, thus n/M source nodes in 
total. In general, the n/M destination nodes corresponding to 
these randomly chosen source nodes can be located anywhere. 
However, from Lemma 5.1, we know that no more than log n
of these destination nodes are located in the same cluster with
high probability. We proceed by accomplishing the three steps 
for these chosen source-destination pairs: 

Phase 1: Setting Up Transmit Cooperation. Clusters work
in parallel. The chosen source node in each cluster distributes
its M bits to the other nodes by using TDMA, which takes
M time-slots in total. Note that as opposed to the scheme
described in Section III-A, there is only one source node
operating in each cluster.

Phase 2: MIMO Transmissions. Successive MIMO transmissions
originated from each cluster are performed, transmitting
the bits of the active source node in each cluster to its
respective destination cluster. Note that in the current case, 
there is only one MIMO transmission originated from each 
cluster, so there are only n/M MIMO transmissions that need 
to be performed in total. This will require total time n/M. 

Phase 3: Cooperate to Decode. Clusters work in parallel.
Each cluster received at most log n MIMO transmissions in
phase 2 by Lemma 5.1-b, each MIMO transmission intended
for a different destination node in the cluster. The received
observations at each node are quantized into Q bits and are to
be conveyed to the actual destination nodes. The traffic inside
each cluster amounts to exchanging at most QM log n bits and
can be completed using TDMA in at most QM log n time slots.
(See Figure 5.)

The operation continues with the next session by choosing 
a new set of n/M source nodes randomly among the nodes 
that have not yet accomplished the above steps. Note that all 
source-destination pairs will accomplish the three steps in a 
total of M sessions. 
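The session structure described above can be sketched as follows. The node labelling (node i belongs to cluster i // M), the assumption that M divides n, and the function names are all hypothetical conveniences for this illustration.

```python
import random
from collections import Counter


def schedule_sessions(n: int, M: int, seed: int = 0) -> list[list[int]]:
    """Return M sessions, each with one not-yet-served source per cluster."""
    rng = random.Random(seed)
    num_clusters = n // M
    # Assumed labelling: cluster c contains nodes c*M, ..., (c+1)*M - 1.
    pending = [list(range(c * M, (c + 1) * M)) for c in range(num_clusters)]
    for members in pending:
        rng.shuffle(members)          # random order of service inside a cluster
    return [[members.pop() for members in pending] for _ in range(M)]


def worst_cluster_load(sessions: list[list[int]], dest: list[int], M: int) -> int:
    """Largest number of active destinations in one cluster in any session."""
    worst = 0
    for active in sessions:
        hits = Counter(dest[s] // M for s in active)
        worst = max(worst, max(hits.values()))
    return worst


n, M = 400, 20
rng = random.Random(1)
dest = [rng.randrange(n) for _ in range(n)]   # random destination per source
sessions = schedule_sessions(n, M)
print(len(sessions), worst_cluster_load(sessions, dest, M))
```

Every node appears as a source in exactly one session, and the second helper reports how many of the active destinations fall in the most loaded cluster, the quantity that Lemma 5.1-b bounds by log n.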

With this rather smoother operation on the network level,
we serve n/M source-destination pairs in each session, that
is, we transfer $M \times \frac{n}{M}$ bits in total to their
destinations in $M + \frac{n}{M} + QM \log n$ time slots, yielding
aggregate throughput

$$\frac{M \times \frac{n}{M}}{M + \frac{n}{M} + QM \log n} \qquad (4)$$

which is maximized by the choice $M = \sqrt{n}$, yielding
aggregate throughput $T(n) = \frac{1}{2+Q}\,\frac{\sqrt{n}}{\log n}$.
The delay experienced by each bit is now much less compared to
the three phase scheme in Section III-A, since it is again
dictated by the total time spent in the three phases (the
denominator of (4)), which is now less than
$D(n) = (2 + Q)\sqrt{n}\log n$.

Fig. 5. The three phase scheme with better scheduling. The figure illustrates the operation in one session.

Note that instead of choosing $M = \sqrt{n}$, which is the
optimal choice to maximize the throughput achieved by the
scheme, one can choose $M = n^b$ with $0 < b < 1/2$. In this
case, we also restrict the number of source-destination pairs
to be served in each session to M, which can be less than the
total number of clusters n/M. Indeed, we operate one source
node in each of the M (< n/M) clusters and simply keep the
remaining clusters inactive. The expression for the aggregate
throughput becomes

$$\frac{M \times M}{M + M + QM \log n}$$

which implies that the scheme achieves aggregate throughput
$T(n) = n^b / \log n$ and delay $D(n) = n^b \log n$ for any
$0 < b < 1/2$. Note that this throughput-delay trade-off differs
only by log n from the trade-off achieved by multi-hop schemes.
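The per-session accounting for the three phase scheme with better scheduling can be evaluated numerically. In the sketch below the function names, the parameter `active` (number of clusters served per session) and the value Q = 1 are assumptions for illustration; throughput is measured in bits per time slot.

```python
import math


def session_time(n: float, M: float, active: float, Q: float = 1.0) -> float:
    """Phase 1 (M slots) + phase 2 (one slot per active cluster) + phase 3."""
    return M + active + Q * M * math.log(n)


def tradeoff(n: float, b: float, Q: float = 1.0) -> tuple[float, float]:
    """Return (throughput, delay) for cluster size M = n^b, 0 < b <= 1/2."""
    M = n ** b
    active = min(M, n / M)            # only M clusters are served when b < 1/2
    delay = session_time(n, M, active, Q)
    throughput = M * active / delay   # M bits per active source per session
    return throughput, delay


n, Q = 1e6, 1.0
for b in (0.2, 0.35, 0.5):
    T, D = tradeoff(n, b, Q)
    print(b, T, D, D / T)
```

For every b in this range the ratio D/T equals $(2 + Q \log n)^2$, independent of b, which is the $(\log n)^2$ gap between delay and throughput stated in the abstract.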

B. Better Scheduling for the Hierarchical Cooperation Scheme 

In this section, we adapt the scheduling idea of Section V-A
to the modified hierarchical scheme presented in Section IV.
However, this modification is not trivial and requires us to 
consider a generalized version of the network multiple access 
problem. 

Definition 5.1 (The Generalized Network MAC Problem):
Consider the assumptions on the network and channel model
given in Section II. Let each of the n nodes in the network be
interested in conveying independent information to a subset
of A(n) nodes ($A(n) \le n$), where the A(n) nodes are
chosen randomly among the n nodes in the network. In
particular, let us assume that each node in the network has
an independent 1 bit message (or L independent bits, with
L constant) to send to each of these A(n) nodes, and the
quantity of interest is the minimal time G(n) required to
accomplish this task. We define this to be the generalized
network multiple access problem.

The following theorem provides an achievable solution to 
this problem. We skip the proof of the theorem since it is 
similar in spirit to the proof of Theorem 4.1.

Theorem 5.1: For any integer $h > 0$, if $A(n) \ge n^{(h-1)/h}$, then
the generalized network multiple access problem can be solved in

$$G(n) \le K\,\frac{A(n)}{n}\,n^{(h+1)/h}\log n$$

time-slots w.h.p., for some constant $K > 0$ independent of n.

Note that the generalized network multiple access problem
contains the network multiple access problem discussed earlier
as a special case with A(n) = n. Plugging A(n) = n in
Theorem 5.1, we recover the result of Theorem 4.1 with an
extra log n factor. Indeed, when the condition
$A(n) \ge n^{(h-1)/h}$ is satisfied with strict inequality in
order, the extra log n factor in Theorem 5.1 is not needed.
However, it is needed to account for the case
$A(n) = n^{(h-1)/h}$, in which case it arises due to part-b
of Lemma 5.1.
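The relation between the two bounds can be made concrete with a short sketch. The function names and the choice K = 1 are assumptions for illustration only; the theorems do not specify the constant.

```python
import math


def mac_bound(n: float, h: int, K: float = 1.0) -> float:
    """Theorem 4.1 bound K n^((h+1)/h) for the network multiple access problem."""
    return K * n ** ((h + 1) / h)


def generalized_mac_bound(n: float, A: float, h: int, K: float = 1.0) -> float:
    """Theorem 5.1 bound K (A/n) n^((h+1)/h) log n, for A >= n^((h-1)/h)."""
    if A < n ** ((h - 1) / h):
        raise ValueError("Theorem 5.1 requires A(n) >= n^((h-1)/h)")
    return K * (A / n) * n ** ((h + 1) / h) * math.log(n)


n, h = 1e4, 2
# With A(n) = n the generalized bound is the Theorem 4.1 bound times log n.
print(generalized_mac_bound(n, n, h), mac_bound(n, h) * math.log(n))
# At the smallest admissible A(n) = n^((h-1)/h) the bound is n log n.
print(generalized_mac_bound(n, n ** ((h - 1) / h), h), n * math.log(n))
```

The second line of output shows that at the boundary case $A(n) = n^{(h-1)/h}$ the bound reduces to order n log n for every h.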

We are now ready to apply the scheduling idea in Section V-A
to the hierarchical cooperation scheme. Consider
dividing the network into clusters of $M_1$ nodes and then
further dividing these clusters into smaller clusters of size $M_2$.
Following the scheduling idea in Section V-A, let us organize
$M_1/M_2$ sessions and for each session randomly choose one
small cluster inside every large cluster. Only the source nodes
located in the chosen small clusters and their corresponding
destination nodes will be served at each session. As usual, we
operate in three successive phases in each session:

Phase 1: Setting Up Transmit Cooperation. The active
small clusters operate in parallel. Note that there is a single
active cluster of size $M_2$ inside every large cluster of size $M_1$.
Let S be the chosen small cluster inside the larger cluster L
that will operate in the current session. In this phase, each
of the $M_2$ source nodes in S needs to distribute its $M_1$ bits
among the $M_1$ nodes in the larger cluster L, so that each of
the $M_1$ bits goes to a different node. This can be accomplished
in two sub-phases:

• Sub-Phase 1: MIMO transmissions. Successive MIMO
transmissions are performed between nodes in S and each
node in L. In each of these MIMO transmissions, say the
one between S and a node d in L (located outside of
S), the $M_2$ nodes in S are simultaneously transmitting
the 1 bit messages they would like to communicate to d.
The $M_2$ nodes located in the same small cluster with d
are acting as a distributed receive antenna array for this
MIMO transmission. Since these MIMO transmissions
should be repeated for every node in L, this sub-phase
takes a total of $M_1$ time-slots. See Figure 6.

• Sub-Phase 2: Cooperate to Decode. All small clusters
in the network work in parallel. In particular, each small
cluster in L has received $M_2$ MIMO transmissions from
S in the previous phase, one MIMO transmission for each
node in this small cluster. Thus, each node in the small
cluster has $M_2$ observations, one from each of the MIMO
transmissions, and each observation is to be conveyed to a
different node in the cluster. Quantizing each observation
into Q bits, we get the network multiple access problem
defined in Section IV in a network of size $M_2$, and by
Theorem 4.1 this problem can be handled in
$QM_2^{(h_1+1)/h_1}$ time-slots for any integer $h_1 > 0$.




Fig. 6. The figure illustrates sub-phase 1 of phase 1 of the modified 
hierarchical scheme with better scheduling. Note that there is only one small 
cluster distributing bits inside every large cluster. 

Phase 2: MIMO Transmissions. At the end of the first
phase, all source nodes in the active small clusters have
distributed their $M_1$ bits among the nodes in the larger cluster.
Now, successive long-distance $M_1 \times M_1$ MIMO transmissions
between large clusters are performed. During each MIMO
transmission, the $M_1$ bits of a particular source node in the
active small cluster are transferred to the destination cluster
where its destination node is located. The number of MIMO
transmissions to be performed in this phase is equal to the
total number of source nodes active in this session. Hence the
total phase can be completed in $\frac{n}{M_1} \times M_2$ time-slots.

Phase 3: Cooperate to Decode. By part-a of Lemma 5.1,
there are order $M_2$ destination nodes located in each of the
large clusters. Thus, each large cluster has received $M_2$ MIMO
transmissions in the previous phase, and the quantized MIMO
observations spread over the $M_1$ nodes of the large cluster
should be collected at the corresponding $M_2$ destination nodes.
This is the generalized network multiple access problem of
size $M_1$ with $A(M_1) = M_2$. By Theorem 5.1, it can be solved
in $Q\frac{M_2}{M_1} \times M_1^{(h_2+1)/h_2} \log M_1$ time-slots for any
integer $h_2 > 0$, provided that $A(M_1) \ge M_1^{(h_2-1)/h_2}$.

Gathering everything together, at every session of this
modified hierarchical cooperation scheme, we deliver
$M_1 \times M_2 \times \frac{n}{M_1}$ bits to their destinations in a total of

$$\left(M_1 + QM_2^{(h_1+1)/h_1}\right) + \frac{n}{M_1} \times M_2 + Q\frac{M_2}{M_1} \times M_1^{(h_2+1)/h_2} \log M_1$$

time-slots. The aggregate throughput is given by

$$\frac{\frac{n}{M_1} \times M_2 \times M_1}{M_1 + QM_2^{(h_1+1)/h_1} + \frac{n}{M_1} \times M_2 + Q\frac{M_2}{M_1} \times M_1^{(h_2+1)/h_2} \log M_1}$$

which is maximized by the choice $h = h_2 = h_1 + 1$,
$M_1 = n^{h/(h+1)}$ and $M_2 = M_1^{(h-1)/h}$, yielding aggregate
throughput $T(n) = n^{h/(h+1)}/\log n$ and delay
$D(n) = n^{h/(h+1)} \log n$. Note that these choices for $M_1$ and
$M_2$ satisfy the constraint $A(M_1) = M_2 \ge M_1^{(h_2-1)/h_2}$.

Note that at this point, we have proven that all points
on the throughput-delay scaling curve
$(T(n), D(n)) = (n^{h/(h+1)}/\log n,\; n^{h/(h+1)} \log n)$
with h being a positive integer are achievable. In order to
show that all points on the line
$(T(n), D(n)) = (n^b/\log n,\; n^b \log n)$ with $0 < b < 1$ are
achievable, we can choose $M_1 = n^b$ with $0 < b < \frac{h}{h+1}$
in the above discussion, while maintaining the relationships
$M_2 = M_1^{(h-1)/h}$ and $h = h_2 = h_1 + 1$. Extending the argument
at the end of Section V-A, we also restrict the number of
small clusters to be served in each session to $M_1^{1/h}$, which
can now be less than the total number of large clusters
$n/M_1$ ($> M_1^{1/h}$). Indeed, we operate one small cluster in each
of the $M_1^{1/h}$ large clusters and simply keep the remaining large
clusters inactive. The expression for the aggregate throughput
becomes

$$\frac{M_1^{1/h} \times M_2 \times M_1}{M_1 + QM_2^{(h_1+1)/h_1} + M_1^{1/h} \times M_2 + Q\frac{M_2}{M_1} \times M_1^{(h_2+1)/h_2} \log M_1}$$

which shows that we can achieve aggregate throughput
$T(n) = M_1/\log M_1$ and delay $D(n) = M_1 \log M_1$. Recalling
that $M_1 = n^b$, we get the points on the throughput-delay
scaling curve $(T(n), D(n)) = (n^b/\log n,\; n^b \log n)$ for any
$0 < b < \frac{h}{h+1}$ and $h > 0$. This concludes the proof of the
main result of this paper. □
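The per-session accounting of the modified hierarchical scheme with better scheduling can be checked numerically for the parameter choices used in the proof. The sketch below is an illustration under stated assumptions: the function and variable names are hypothetical, Q = 1 is arbitrary, and we require $h \ge 2$ so that $h_1 = h - 1$ is a positive integer.

```python
import math


def hierarchical_tradeoff(n: float, b: float, h: int, Q: float = 1.0) -> tuple[float, float]:
    """Return (throughput, delay) for M_1 = n^b, M_2 = M_1^((h-1)/h), h_2 = h, h_1 = h-1."""
    if h < 2 or not 0 < b <= h / (h + 1):
        raise ValueError("need h >= 2 and 0 < b <= h/(h+1)")
    M1 = n ** b
    M2 = M1 ** ((h - 1) / h)
    h1, h2 = h - 1, h
    active_large = M1 ** (1 / h)      # large clusters served in one session
    phase1 = M1 + Q * M2 ** ((h1 + 1) / h1)
    phase2 = active_large * M2        # one MIMO transmission per active source
    phase3 = Q * (M2 / M1) * M1 ** ((h2 + 1) / h2) * math.log(M1)
    delay = phase1 + phase2 + phase3
    bits = active_large * M2 * M1     # M_1 bits per active source node
    return bits / delay, delay


n, b, h, Q = 1e8, 0.6, 3, 1.0
T, D = hierarchical_tradeoff(n, b, h, Q)
M1 = n ** b
print(T, M1 / math.log(M1))
print(D, M1 * math.log(M1))
```

With these choices each of the three phases takes order $M_1$ time-slots up to the logarithmic factor in phase 3, so the throughput and delay agree with $M_1/\log M_1$ and $M_1 \log M_1$ up to constants. At $b = h/(h+1)$ the number of active large clusters $M_1^{1/h}$ coincides with $n/M_1$, so the same function also covers the case where all large clusters are active.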



VI. Conclusion 

The present work shows that hierarchical cooperation not 
only can lead to higher throughput in ad hoc networks, but 
also to reasonable end-to-end delay, given that some extra 
care is taken in setting up cooperation at the lower levels and 
scheduling communications. Meanwhile, we have discussed 
the network multiple-access problem in the present paper, 
which is of interest in its own right. 